Pd support #94

Merged
mayabar merged 7 commits into llm-d:main on Jul 15, 2025

Conversation

@mayabar (Collaborator) commented Jul 14, 2025

No description provided.

mayabar added 3 commits July 14, 2025 16:08
…ecode fields

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Update config_test to use a function to create configuration same as defined in the config yaml file

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
change command line argument name to 'kv-cache-transfer-latency'

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
@mayabar requested review from shmuelk and irar2 on July 14, 2025 13:12

```go
)

DescribeTable("invalid configurations",
	func(args []string) {
		_, err := createSimConfig(args)
		Expect(err).To(HaveOccurred())
	},
	Entry(tests[7].name, tests[7].args),
```

Collaborator:
You removed a test here instead of only increasing the indices

Collaborator Author:
added test 13
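
For reference, re-adding the case in the Ginkgo `DescribeTable` amounts to one more `Entry` line; a minimal sketch, assuming the new case simply became index 13 of the existing `tests` slice:

```go
DescribeTable("invalid configurations",
	func(args []string) {
		_, err := createSimConfig(args)
		Expect(err).To(HaveOccurred())
	},
	Entry(tests[7].name, tests[7].args),
	// ... entries 8-12 unchanged ...
	Entry(tests[13].name, tests[13].args), // re-added case, index taken from "added test 13"
)
```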


```go
	return c
}
```

Collaborator:
I think this is confusing, and one function is enough. There is only one case for createDefaultBasicConfig, and we can just update the lora parameters in the test.

Collaborator Author:
basicConfig function was removed

```go
// isDoRemoteDecode() returns true is do_remote_decode is true in the request, this means that this is prefill request
doRemoteDecode() bool
// isDoRemotePrefill() returns true is do_remote_prefill is true in the request, this means that this is decode request
doRemotePrefill() bool
```

Collaborator:
The names in the comments don't match the actual names

Collaborator Author:
fixed
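
The fix presumably aligns the comments with the declared method names; a sketch of the corrected doc comments (the enclosing interface name is assumed, only the method names and their semantics come from the diff above):

```go
// completionRequest is an assumed interface name used only for this sketch.
type completionRequest interface {
	// doRemoteDecode returns true if do_remote_decode is true in the request,
	// which means this is a prefill request
	doRemoteDecode() bool
	// doRemotePrefill returns true if do_remote_prefill is true in the request,
	// which means this is a decode request
	doRemotePrefill() bool
}
```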

```go
RemoteBlockIds []string `json:"remote_block_ids"`
RemoteEngineId string   `json:"remote_engine_id"`
RemoteHost     string   `json:"remote_host"`
RemotePort     int      `json:"remote_port"`
```

Collaborator:
Consider adding comments for the fields

Collaborator Author:
added
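
The added field comments are not shown in the thread; a sketch of what they might look like (only the field names and JSON tags are taken from the diff, the struct name and comment wording are assumptions):

```go
// remoteKVInfo is an assumed struct name for this sketch.
type remoteKVInfo struct {
	RemoteBlockIds []string `json:"remote_block_ids"` // KV-cache block IDs on the remote instance
	RemoteEngineId string   `json:"remote_engine_id"` // ID of the remote engine participating in the P/D transfer
	RemoteHost     string   `json:"remote_host"`      // host of the remote instance
	RemotePort     int      `json:"remote_port"`      // port of the remote instance
}
```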

```go
RemoteBlockIds []string `json:"remote_block_ids"`
RemoteEngineId string   `json:"remote_engine_id"`
RemoteHost     string   `json:"remote_host"`
RemotePort     int      `json:"remote_port"`
```

Collaborator:
Likewise

Collaborator Author:
added

```
@@ -304,6 +306,8 @@ func (s *VllmSimulator) readRequest(ctx *fasthttp.RequestCtx, isChatCompletion b
	var req textCompletionRequest

	err := json.Unmarshal(ctx.Request.Body(), &req)

	fmt.Printf("Unmarshaled text request: %#v\n", req)
	return &req, err
```

Collaborator:
Debug printing?
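
The simplest resolution (the thread does not show what was actually merged) is to drop the leftover print before returning:

```go
	var req textCompletionRequest

	err := json.Unmarshal(ctx.Request.Body(), &req)
	// leftover debug print removed; the unmarshal error is returned to the caller
	return &req, err
```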

```diff
@@ -477,16 +491,23 @@ func (s *VllmSimulator) reqProcessingWorker(ctx context.Context, id int) {
 			isChatCompletion: reqCtx.isChatCompletion,
 			model:            displayModel,
 		},
-		responseTokens, toolCalls, finishReason, usageDataToSend,
+		responseTokens, toolCalls, finishReason, usageDataToSend, req.doRemotePrefill(),
```

Collaborator:
Maybe add doRemotePrefill to streamingContext?

Collaborator Author:
good idea, added
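
A sketch of the resulting struct; `isChatCompletion` and `model` are taken from the diff context above, the rest of the real `streamingContext` is not shown in the thread:

```go
// sketch only: the actual streamingContext in the simulator has additional fields
type streamingContext struct {
	isChatCompletion bool
	model            string
	doRemotePrefill  bool // carried here so the streaming path can pick the first-token delay
}
```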

```
@@ -638,6 +671,14 @@ func (s *VllmSimulator) sendResponse(isChatCompletion bool, ctx *fasthttp.Reques
	s.responseSentCallback(modelName)
}

// returns time to first token based on whether
```

Collaborator:
on whether what? :)

Collaborator Author:
fixed
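
The fixed comment presumably documents that the first-token delay depends on whether the request does remote prefill; a minimal sketch of such a helper, with the function name, signature, and parameter names assumed rather than taken from the merged code (needs `import "time"`):

```go
// getTimeToFirstToken returns the delay before the first token: the KV-cache
// transfer latency when the request does remote prefill, otherwise the regular
// time-to-first-token. Names and signature are assumptions for this sketch.
func getTimeToFirstToken(timeToFirstToken, kvCacheTransferLatency time.Duration, doRemotePrefill bool) time.Duration {
	if doRemotePrefill {
		return kvCacheTransferLatency
	}
	return timeToFirstToken
}
```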

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
@mayabar requested a review from irar2 on July 15, 2025 06:17
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
irar2 previously approved these changes Jul 15, 2025
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
irar2 previously approved these changes Jul 15, 2025
README.md Outdated

```diff
@@ -29,9 +29,9 @@ The simulator supports two modes of operation:
 - `echo` mode: the response contains the same text that was received in the request. For `/v1/chat/completions` the last message for the role=`user` is used.
 - `random` mode: the response is randomly chosen from a set of pre-defined sentences.
 
-Timing of the response is defined by two parameters: `time-to-first-token` and `inter-token-latency`.
+Timing of the response is defined by `time-to-first-token` and `inter-token-latency` parameters. In case P/D is enabled for a request, `kv-cache-transfer-latency` will be used instead of `time-to-first-token`.
```

Collaborator:
replace: is defined by `time-to-first-token

with: is defined by the time-to-first-token

Collaborator Author:
done

README.md Outdated

```diff
-For a request with `stream=true`: `time-to-first-token` defines the delay before the first token is returned, `inter-token-latency` defines the delay between subsequent tokens in the stream.
+For a request with `stream=true`: `time-to-first-token` or `kv-cache-transfer-latency` defines the delay before the first token is returned, `inter-token-latency` defines the delay between subsequent tokens in the stream.
 
 For a request with `stream=false`: the response is returned after delay of `<time-to-first-token> + (<inter-token-latency> * (<number_of_output_tokens> - 1))`
```

Collaborator:
Why wasn't kv-cache-transfer-latency mentioned here?
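
One way the non-streaming formula could account for P/D (a suggestion, not the merged README wording): when P/D is enabled for the request, `<kv-cache-transfer-latency>` takes the place of `<time-to-first-token>` in the same expression. A self-contained sketch of that computation:

```go
package main

import (
	"fmt"
	"time"
)

// totalResponseDelay mirrors the documented formula: the first-token delay plus
// inter-token latency for every remaining output token. When P/D is enabled,
// the KV-cache transfer latency is used as the first-token delay.
func totalResponseDelay(ttft, interToken, kvTransfer time.Duration, pdEnabled bool, numOutputTokens int) time.Duration {
	first := ttft
	if pdEnabled {
		first = kvTransfer
	}
	return first + interToken*time.Duration(numOutputTokens-1)
}

func main() {
	// example values only
	fmt.Println(totalResponseDelay(2*time.Second, 100*time.Millisecond, 500*time.Millisecond, true, 10))
}
```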

```diff
@@ -616,8 +649,8 @@ func (s *VllmSimulator) createCompletionResponse(isChatCompletion bool, respToke
 // finishReason - a pointer to string that represents finish reason, can be nil, stop, length, or tools
 // usageData - usage (tokens statistics) for this response
 func (s *VllmSimulator) sendResponse(isChatCompletion bool, ctx *fasthttp.RequestCtx, respTokens []string, toolCalls []toolCall,
-	modelName string, finishReason string, usageData *usage) {
+	modelName string, finishReason string, usageData *usage, doRemoteDecode bool, doRemotePrefill bool) {
 	resp := s.createCompletionResponse(isChatCompletion, respTokens, toolCalls, &finishReason, usageData, modelName)
```

Collaborator:
The names doRemoteDecode and doRemotePrefill are strange. Shouldn't they be something like doPrefillOnly and doDecodeOnly, respectively?

Collaborator Author:
do-remote-decode and do-remote-prefill are field names of vLLM's request/response
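
For context, a sketch of how those vLLM request/response fields map onto Go JSON tags (the struct and its name are illustrative; only the JSON keys and their meaning come from this thread):

```go
// pdTransferFlags is an assumed name for this sketch.
type pdTransferFlags struct {
	DoRemoteDecode  bool `json:"do_remote_decode"`  // true means this is a prefill request
	DoRemotePrefill bool `json:"do_remote_prefill"` // true means this is a decode request
}
```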

Signed-off-by: Maya Barnea <mayab@il.ibm.com>
@mayabar merged commit 996dae3 into llm-d:main on Jul 15, 2025
2 checks passed